Genres In Formation? An Exploratory Study of Web Pages using Cluster Analysis
Abstract
The Web is a new, large and heterogeneous community where the interaction among users and the possibilities offered by technology may modify existing genres or create new ones. In fact, most genres borrowed from the paper world have undergone adjustments when moving to the Web (for instance, online newspapers and online manuals). There is also a family of genres created specifically for the Web, e.g. home pages, splash screens, newsletters and hotlists. Besides these, are there other emerging genres on the Web for which a genre label has not yet been coined? Is it possible to capture genres in formation automatically? An experiment using cluster analysis was set up to provide initial answers to these questions. Results show that the main clusters are quite well defined and exhibit a number of regularities. Interestingly, Web pages appear to have been clustered according to their rhetorical/discoursal types (informational, instructional, argumentative, etc.) rather than their genre classes (e.g. sermons and editorials, both argumentative, fall into the same cluster). The perception of rhetorical/discoursal types in Web pages was confirmed by a small-scale Web user study.
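The abstract does not say which clustering algorithm or which page features the experiment used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each page has already been reduced to a numeric feature vector and groups the vectors with plain k-means (Lloyd's algorithm), a swapped-in technique chosen for brevity. The two features (`imperative_ratio`, `evaluative_ratio`) and all page vectors are hypothetical toy values invented so that instructional and argumentative pages separate; they are not data from the study.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) over lists of floats.

    Returns (centroids, labels). Illustrative only; the study's actual
    clustering procedure is not specified in the abstract.
    """
    rng = random.Random(seed)
    # Initialise centroids with k points drawn at random from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical feature vectors: [imperative_ratio, evaluative_ratio].
# Values are invented for illustration, not taken from the study.
pages = {
    "recipe": [0.8, 0.1],
    "install_guide": [0.9, 0.05],
    "faq_howto": [0.7, 0.15],
    "editorial": [0.1, 0.8],
    "sermon": [0.15, 0.9],
    "product_review": [0.05, 0.7],
}

names = list(pages)
_, labels = kmeans([pages[n] for n in names], k=2)
for name, label in zip(names, labels):
    print(f"cluster {label}: {name}")
```

On these toy vectors the three instruction-like pages end up in one cluster and the three opinion-like pages in the other; the numeric cluster labels themselves depend on the random initialisation. The point of the sketch is only that pages with similar feature profiles group together regardless of their nominal genre label.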
Publication date: 2005